64 research outputs found

    Accelerating phase unwrapping and affine transformations for optical quadrature microscopy using CUDA

    Optical Quadrature Microscopy (OQM) is a process that uses phase data to capture information about the sample being studied. OQM is part of an imaging framework developed by the Optical Science Laboratory at Northeastern University. In one particular application of interest, the framework is used to extract phase information from the image of an embryo to determine embryo viability. Phase unwrapping is the process of reconstructing the real phase shift (propagation delay) of a sample from the measured “wrapped” representation, which lies between −π and +π. Unwrapping can be done using the Minimum Lᵖ Norm Phase Unwrap algorithm. Images are first preprocessed using an Affine Transform before they are unwrapped. Both of these steps are time consuming and would benefit greatly from parallelization and acceleration. Faster processing would lower many research barriers (in terms of throughput…
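
    The paper's own contribution is a CUDA acceleration of the Minimum Lᵖ Norm unwrap and the affine-transform preprocessing, neither of which is reproduced here. As a minimal sketch of what "unwrapping" means, the following Python/NumPy fragment (function names are my own, not the paper's) wraps a phase ramp into [−π, π) and recovers it with a simple 1-D path-following (Itoh) integrator:

```python
import numpy as np

def wrap(phase):
    """Map arbitrary phase values into the principal interval [-pi, pi)."""
    return (phase + np.pi) % (2 * np.pi) - np.pi

def unwrap_1d(wrapped):
    """Minimal 1-D path-following (Itoh) unwrap: re-integrate the wrapped
    phase differences.  This is NOT the Minimum L^p Norm algorithm used in
    the paper; it only illustrates reconstructing real phase from wrapped phase."""
    diffs = wrap(np.diff(wrapped))
    return np.concatenate(([wrapped[0]], wrapped[0] + np.cumsum(diffs)))

# A phase ramp that exceeds +/- pi is wrapped, then recovered exactly
# (valid as long as neighbouring samples differ by less than pi).
true_phase = np.linspace(0, 12 * np.pi, 500)
recovered = unwrap_1d(wrap(true_phase))
assert np.allclose(recovered, true_phase)
```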

    Verifying a logic synthesis tool in Nuprl: a case study in software verification

    We have verified a logic synthesis tool with the Nuprl proof development system. The logic synthesis tool, Pbs, implements the weak division algorithm and is part of the Bedroc hardware synthesis system. Our goal was to develop a proven and usable implementation of a hardware synthesis tool. Pbs consists of approximately 1000 lines of code implemented in a functional subset of Standard ML. The program was verified by embedding this subset of SML in Nuprl and then verifying the correctness of the implementation of Pbs in Nuprl. In the process of doing the proof we learned many lessons which can be applied to efforts in verifying functional software. In particular, we were able to safely perform several optimizations to the program. In addition, we have invested effort into verifying software which will be used many times, rather than verifying the output of that software each time the program is used. The work required to verify hardware design tools and other similar software is worthwhile because the results of the proofs will be used many times.
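
    Pbs itself is roughly 1000 lines of verified Standard ML and is not reproduced here. For readers unfamiliar with the algorithm it implements, the following Python sketch shows algebraic (weak) division of one sum-of-products expression by another; the representation and names are illustrative only and are not taken from Pbs:

```python
from functools import reduce

def weak_divide(F, D):
    """Algebraic (weak) division: given sum-of-products expressions F and D,
    each a set of cubes (a cube being a frozenset of literals), find Q and R
    with F = Q*D + R, where Q*D is formed algebraically.  Sketch only; the
    names and data representation are mine, not those used in Pbs."""
    quotients = []
    for d in D:
        # cubes of F that contain cube d, with d's literals stripped off
        quotients.append({c - d for c in F if d <= c})
    Q = reduce(set.intersection, quotients) if quotients else set()
    QD = {q | d for q in Q for d in D}   # algebraic product Q*D
    R = F - QD                           # remainder
    return Q, R

# F = ac + ad + bc + bd + e, D = a + b  ->  Q = c + d, R = e
F = {frozenset('ac'), frozenset('ad'), frozenset('bc'), frozenset('bd'), frozenset('e')}
D = {frozenset('a'), frozenset('b')}
Q, R = weak_divide(F, D)
print(sorted(map(sorted, Q)), sorted(map(sorted, R)))   # [['c'], ['d']] [['e']]
```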

    Digital Pre-distortion Implemented Using FPGA

    Massive MIMO and beamforming techniques have long been proposed as a means of increasing cellular network capacity and improving signal-to-interference ratio performance. The implementation of such systems requires a large number of signal transmission paths. To realize this, a distributed array of power amplifiers (PAs) is likely to be needed. These PAs will possess similar, but unique, characteristics which will change over time independently due to temperature drift and component ageing. In order to operate all PAs in both a linear and efficient fashion, a linearisation technique, such as Digital Pre-Distortion (DPD), must be used. DPD algorithms benefit from reconfigurability, low latency and power efficiency, all traits associated with Field Programmable Gate Arrays (FPGAs). This demonstration shows how an FPGA, specifically a Zynq System on Chip (SoC), can be used in tandem with a transceiver board, the FMCOMMS2, to implement a DPD system.
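
    The demonstration itself is an FPGA implementation, and the abstract gives no algorithmic detail. As a purely illustrative sketch of the DPD idea, the following Python fragment applies a generic memory-polynomial pre-distorter to complex baseband samples; the model, function name, and coefficient values are assumptions, not the demonstrated design:

```python
import numpy as np

def predistort(x, coeffs, memory_depth, order):
    """Apply a memory-polynomial pre-distorter to complex baseband samples x.
    coeffs[m, k] multiplies x[n-m] * |x[n-m]|**(2k).  Generic illustration of
    DPD, not the algorithm used in the demonstrated FPGA/FMCOMMS2 system."""
    y = np.zeros_like(x, dtype=complex)
    for m in range(memory_depth):
        xm = np.roll(x, m)
        xm[:m] = 0                      # no samples exist before the start
        for k in range(order):
            y += coeffs[m, k] * xm * np.abs(xm) ** (2 * k)
    return y

# Hypothetical coefficients: identity pass-through plus a small third-order
# correction term (values are made up for illustration only).
coeffs = np.zeros((2, 3), dtype=complex)
coeffs[0, 0] = 1.0
coeffs[0, 1] = 0.05 - 0.02j
x = np.exp(2j * np.pi * 0.01 * np.arange(256))     # complex test tone
y = predistort(x, coeffs, memory_depth=2, order=3)
```

    In a real system the coefficients would be estimated adaptively from PA feedback so that the pre-distorter inverts each amplifier's own nonlinearity; here they are fixed purely to show the structure of the computation.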

    Memory Traffic and Data Cache Behavior of an MPEG-2 Software Decoder

    We investigate the impact of multimedia applications on the cache behavior of desktop systems. Specifically, we consider the memory bandwidth and data cache challenges associated with MPEG-2 software decoding. Recent extensions to instruction set architectures, including Intel's MMX, address the computational aspects of MPEG decoding. The large amount of data traffic generated, however, has received little attention. Standard data caches consistently generate an excess of cache-memory traffic. Varying basic cache parameters only reduces traffic to double the minimum required at best. Incremental changes in cache size have a negligible effect for most feasible values. Increasing set associativity yields rapidly diminishing returns, and manipulating line size is similarly unproductive. Achieving higher efficiency requires understanding the composition and behavior of the decoder data set. We present a model of MPEG-2 decoder memory behavior and describe how to exploit this knowledge to m…
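
    As a toy illustration of why varying basic cache parameters helps so little for streaming multimedia data (this is not the paper's simulator, and the numbers below are hypothetical), the following Python sketch counts miss traffic for an LRU set-associative cache over a sequential access trace:

```python
from collections import OrderedDict

def miss_traffic(addresses, cache_bytes, line_bytes, ways):
    """Count miss traffic (bytes fetched from memory) for an LRU
    set-associative cache over a byte-address trace.  Toy model of the
    parameters the paper varies; not the paper's methodology or data."""
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        s = sets[line % num_sets]
        if line in s:
            s.move_to_end(line)          # LRU update on a hit
        else:
            misses += 1
            s[line] = None
            if len(s) > ways:
                s.popitem(last=False)    # evict the least recently used line
    return misses * line_bytes

# Streaming access to data larger than the cache: every line is fetched once,
# so the traffic is identical for all three (hypothetical) cache sizes.
trace = range(0, 1 << 20, 8)             # sequential 8-byte reads over 1 MiB
for size in (16 << 10, 32 << 10, 64 << 10):
    print(size, miss_traffic(trace, size, line_bytes=32, ways=4))
```

    For a purely sequential trace larger than any of the caches tried, each line is fetched exactly once regardless of capacity or associativity, mirroring the abstract's observation that incremental cache-size changes have a negligible effect on this class of workload.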

    Floating-Point Division and Square Root: Choosing the Right Implementation

    The purpose of this paper is to clarify and evaluate the implementation tradeoffs at the FPU level, thus enabling designers to make informed decisions. Division and square root have long been considered minor, bothersome members of the floating-point family. Microprocessor designers frequently perceive them as infrequent, low-priority operations, barely worth the trouble of implementing; design effort and chip resources are allocated accordingly. The survey of microprocessor FPU performance in Table 1 shows some of the uneven results of this philosophy. While multiplication requires from 2 to 5 machine cycles, division latencies range from 9 to 60. The variation is even greater for square root, which is not supported in hardware in several cases. This data hints at but mostly conceals the significant variation in algorithms and topologies among the different implementations. The error in the Intel Pentium floating-point unit, and the accompanying publicity and $475 million write-off, illustrate some of the hazards of an incorrect division implementation [8]. But correctness is not enough; low performance causes enough problems of its own. Even though divide and square root are relatively infrequent operations in most applications, they are indispensable, particularly in many scientific programs. Compiler optimizations tend to increase the frequency of these operations [11]; poor implementations disproportionately penalize code which uses them at all [10]. Furthermore, as the latency gap grows between addition and multiplication on the one hand and divide/square root on the other, the latter increasingly become performance bottlenecks [14]. Programmers have attempted to get around this problem by rewriting algorithms to avoid divide/square root operations, but the resulting code generally…
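
    The abstract alludes to a wide spread of division and square-root algorithms without describing them. As one example of the multiplication-based (iterative) family, here is a hedged Python sketch of Newton-Raphson reciprocal and inverse-square-root iterations; the seed choices and operand scaling are simplified assumptions, not taken from the paper:

```python
def nr_reciprocal(b, iterations=4):
    """Newton-Raphson reciprocal: x_{k+1} = x_k * (2 - b * x_k).
    Assumes b has been pre-scaled into [0.5, 1), as a hardware divider would
    do; the linear seed 48/17 - 32/17*b is a classic choice.  Sketch of one
    multiplication-based scheme, not an implementation from the paper."""
    x = 48 / 17 - 32 / 17 * b
    for _ in range(iterations):
        x = x * (2.0 - b * x)           # quadratic convergence: error squares each step
    return x

def nr_sqrt(a, iterations=4):
    """Square root via Newton iteration on the inverse square root:
    y_{k+1} = y_k * (3 - a*y_k*y_k) / 2, then sqrt(a) = a * y.
    The constant seed is adequate for a near 1; real FPUs use lookup tables."""
    y = 1.0
    for _ in range(iterations):
        y = y * (3.0 - a * y * y) / 2.0
    return a * y

print(nr_reciprocal(0.75) * 0.75)       # ~1.0
print(nr_sqrt(0.9) ** 2)                # ~0.9
```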

    A Theorem Proving Based Methodology for Software Verification

    We have developed an effective methodology for using a proof development system to prove properties about functional programs. This methodology includes techniques such as hiding implementation details and using higher order theorems to structure proofs and aid in abstract reasoning. The methodology was discovered and refined while verifying a logic synthesis tool with the Nuprl proof development system. The logic synthesis tool, Pbs, implements the weak division algorithm. Pbs consists of approximately 1000 lines of code implemented in a functional subset of Standard ML. It is a proven and usable implementation of a hardware synthesis tool. The program was verified by embedding the subset of SML in Nuprl and then verifying the correctness of the implementation of Pbs in the Nuprl logic.